Abstract. In this paper, we consider the generalized min-sum set cover problem, introduced by Azar, Gamzu, and Yin [1]. Bansal, Gupta, and Krishnaswamy [2] give a 485-approximation algorithm for the problem. We are able to alter their algorithm and analysis to obtain a 28-approximation algorithm, improving the performance guarantee by an order of magnitude. We use concepts from $\alpha$-point scheduling to obtain our improvements.



1. Introduction

In this note, we consider the generalized min-sum set cover problem. In this problem we are given as input a universe $U$ of $n$ elements, a collection $\mathcal{S} = \{S_1, \ldots, S_m\}$ of subsets $S_i$ of $U$, and a covering requirement $K(S)$ for each $S \in \mathcal{S}$, where $K(S) \in \{1, 2, \ldots, |S|\}$. The output of any algorithm for the problem is an ordering of the $n$ elements. Let $C_S$ be the position of the $K(S)$th element of $S$ in the ordering. The goal is to find an ordering that minimizes $\sum_{S \in \mathcal{S}} C_S$. This problem is a generalization of the min-sum set cover problem (in which $K(S) = 1$ for all $S \in \mathcal{S}$), introduced by Feige, Lovász, and Tetali [3], and the min-latency set cover problem (in which $K(S) = |S|$ for all $S \in \mathcal{S}$), introduced by Hassin and Levin [4]. This generalization was introduced by Azar, Gamzu, and Yin [1] in the context of a ranking problem.
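To make the objective concrete, the following minimal sketch evaluates $\sum_S C_S$ for a given ordering. The function names `cover_positions` and `objective` and the toy instance are illustrative assumptions, not part of the paper.

```python
def cover_positions(order, sets, K):
    """Return C_S for every set: the position (1-indexed) in `order`
    of the K(S)-th element of S to appear."""
    position = {e: i for i, e in enumerate(order, start=1)}
    result = []
    for S, k in zip(sets, K):
        if not 1 <= k <= len(S):
            raise ValueError("K(S) must lie in {1, ..., |S|}")
        ranked = sorted(position[e] for e in S)
        result.append(ranked[k - 1])
    return result


def objective(order, sets, K):
    """Sum of C_S over all sets, the quantity to be minimized."""
    return sum(cover_positions(order, sets, K))


# Hypothetical toy instance.
order = ["a", "b", "c", "d"]
sets = [{"a", "c"}, {"b", "c", "d"}]
min_sum = objective(order, sets, [1, 1])      # K(S) = 1: min-sum set cover
min_latency = objective(order, sets, [2, 3])  # K(S) = |S|: min-latency set cover
```

With $K(S) = 1$ the sets are covered at positions 1 and 2, while with $K(S) = |S|$ they are covered at positions 3 and 4, which illustrates how the two special cases differ on the same ordering.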

Because the problem is NP-hard, Azar, Gamzu, and Yin give an $O(\log r)$-approximation algorithm for the problem, where $r = \max_{S \in \mathcal{S}} |S|$. This was improved to a constant-factor randomized approximation algorithm by Bansal, Gupta, and Krishnaswamy [2]. They introduce a new linear programming relaxation for the problem and show how to use randomized rounding to achieve a performance guarantee of 485. In this paper, we show that by altering their algorithm using some concepts from $\alpha$-point scheduling (see Skutella [6] for a survey), we are able to improve their algorithm and obtain a performance guarantee of about 28, which is an order of magnitude better.

We now briefly review their algorithm and analysis, and then state the ideas we introduce to obtain an improvement. Their algorithm begins with solving the following linear programming relaxation of the problem, where the variable $y_{S,t}$ for $t \in [n]$ (here and in the following the set $\{1, \ldots, n\}$ is denoted by $[n]$) and set $S \in \mathcal{S}$ indicates whether $C_S < t$ or not, and $x_{e,t}$ for $e \in U$ and $t \in [n]$ indicates whether element $e$ is assigned to the $t$th position of the ordering:

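$$
\begin{aligned}
\min\ & \sum_{t \in [n]} \sum_{S \in \mathcal{S}} (1 - y_{S,t}) \\
\text{s.t.}\ & \sum_{e \in U} x_{e,t} = 1, && \text{for all } t \in [n], \\
& \sum_{t \in [n]} x_{e,t} = 1, && \text{for all } e \in U, \\
& \sum_{e \in S \setminus A} \sum_{t' < t} x_{e,t'} \ge (K(S) - |A|) \cdot y_{S,t}, && \text{for all } S \in \mathcal{S},\ A \subseteq S,\ t \in [n], \\
& x_{e,t},\ y_{S,t} \in [0, 1], && \text{for all } e \in U,\ S \in \mathcal{S},\ t \in [n].
\end{aligned}
$$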
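The third constraint family has one inequality per subset $A \subseteq S$, but for a fixed size $|A| = a$ the left-hand side is smallest when $A$ contains the $a$ elements with the largest prefix mass $\sum_{t' < t} x_{e,t'}$. The sketch below uses this observation to check the family for one set by sorting. It is one natural way to test the constraints, not necessarily the procedure used by Bansal et al., and the function name and data layout are assumptions.

```python
def covering_constraints_hold(x, y, S, k, tol=1e-9):
    """Check sum_{e in S\\A} sum_{t'<t} x[e][t'] >= (k - |A|) * y[t]
    for all A subset of S and all t, for a single set S with K(S) = k.

    x: dict mapping element -> list [x_{e,1}, ..., x_{e,n}]
    y: list [y_{S,1}, ..., y_{S,n}] for this set S
    """
    n = len(y)
    prefix = {e: 0.0 for e in S}  # holds sum_{t' < t} x_{e,t'}
    for t in range(1, n + 1):
        masses = sorted(prefix.values())  # ascending prefix masses
        # |A| >= k makes the right-hand side nonpositive, so only a < k matters.
        for a in range(k):
            lhs = sum(masses[: len(S) - a])  # drop the a largest masses
            if lhs < (k - a) * y[t - 1] - tol:
                return False
        for e in S:
            prefix[e] += x[e][t - 1]
    return True


# Hypothetical integral solution: "a" at position 1, "b" at position 2.
x = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
ok_k2 = covering_constraints_hold(x, [0.0, 0.0], {"a", "b"}, 2)
bad_k2 = covering_constraints_hold(x, [0.0, 1.0], {"a", "b"}, 2)
ok_k1 = covering_constraints_hold(x, [0.0, 1.0], {"a", "b"}, 1)
```

In the example, claiming $y_{S,2} = 1$ is infeasible when $K(S) = 2$, because only one element of $S$ is placed before time 2, but it is feasible when $K(S) = 1$.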

Bansal et al. observe that the exponentially many constraints can be separated in polynomial time, so that the linear program can be solved efficiently. Let $x^*, y^*$ be an optimal solution. The algorithm proceeds in a sequence of $\lceil \log n \rceil$ stages. In the $i$th stage, the algorithm of Bansal et al. computes a probability $p_{e,i} := \min\{1, 8 \sum_{t \le 2^i} x^*_{e,t}\}$ for each element $e \in U$ by taking the amount that element $e$ is fractionally scheduled up to time $2^i$ and boosting it by a factor of 8. With probability $p_{e,i}$ it includes element $e$ in a set $O_i$. If $|O_i| > 16 \cdot 2^i$, the algorithm randomly chooses $16 \cdot 2^i$ elements from $O_i$ and discards the remainder from $O_i$. For each $i$, the algorithm picks an arbitrary order for the elements in $O_i$, then schedules the elements in the order $O_1, O_2, \ldots, O_{\lceil \log n \rceil}$. Notice that it is possible that an element will appear in more than one $O_i$ and is scheduled more than once; one can compute an ordering that keeps only the first occurrence of each element.
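The staged rounding described above can be sketched as follows. This is a loose illustration of the description in this note rather than the authors' implementation. The function name, the dictionary layout of `x_star`, the base-2 logarithm, and the use of at least one stage when $n = 1$ are assumptions.

```python
import math
import random


def staged_rounding(x_star, rng):
    """Sketch of the staged randomized rounding described in the text.

    x_star: dict mapping element -> list [x*_{e,1}, ..., x*_{e,n}]
    rng:    a random.Random instance
    """
    n = len(next(iter(x_star.values())))
    stages = max(1, math.ceil(math.log2(n)))  # assumption: at least one stage
    schedule = []
    for i in range(1, stages + 1):
        horizon = min(n, 2 ** i)
        # Include e with probability min{1, 8 * (mass scheduled up to time 2^i)}.
        O_i = [
            e
            for e, row in x_star.items()
            if rng.random() < min(1.0, 8 * sum(row[:horizon]))
        ]
        cap = 16 * 2 ** i
        if len(O_i) > cap:
            O_i = rng.sample(O_i, cap)  # keep a random subset of size 16 * 2^i
        schedule.extend(O_i)
    # Keep only the first occurrence of each element.
    seen, order = set(), []
    for e in schedule:
        if e not in seen:
            seen.add(e)
            order.append(e)
    return order


# Hypothetical fractional solution: six elements spread uniformly over six slots.
elements = ["a", "b", "c", "d", "e", "f"]
x_star = {e: [1 / 6] * 6 for e in elements}
order = staged_rounding(x_star, random.Random(0))
```

In the last stage $2^i \ge n$, so every element has inclusion probability 1 and the cap $16 \cdot 2^i$ exceeds $n$; the returned list is therefore a full ordering of the elements.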

The analysis of Bansal et al. works by looking at a time $t^*_S$ for each $S \in \mathcal{S}$, which is the smallest $t$ such that $y^*_{S,t} > 1/2$. The analysis then shows that for any stage $i$ with $t^*_S \le 2^i$, with probability at least $1 - e^{-1}$ at least $K(S)$ elements have been marked in stage $i$ and no elements are discarded from $O_i$. From this, the analysis infers that $E[C_S] \le 64 \cdot \frac{1}{1 - 2e^{-1}} \cdot t^*_S$. Since the value of the linear program is at least $\frac{1}{2} \sum_{S \in \mathcal{S}} t^*_S$, the paper derives that the expected value of the solution is at most $128 \cdot \frac{1}{1 - 2e^{-1}} \approx 484.4$ times the value of the linear program.

While we still use several ideas from their algorithm and analysis, we modify it in several key ways. In particular, we discard the idea of stages, and we use the idea of a random $\alpha$-point for each element $e$: after modifying the solution $x^*$ to a solution $x$ in a way similar to theirs, we randomly choose a value $\alpha_e \in [0, 1]$ for each $e \in U$. Let $t_{e,\alpha_e}$ be the first time $t$ for which $\sum_{t'=1}^{t} x_{e,t'} \ge \alpha_e$. We then schedule the elements $e$ in order of nondecreasing $t_{e,\alpha_e}$. The improvements in the analysis come from scrapping the stages (so we don't need to account for the possibility of $O_i$ being too large) and using $\alpha$-point scheduling; in particular, we introduce a parameter $\alpha$ and look for the last point in time $t_{S,\alpha}$ at which $y^*_{S,t} < \alpha$ (the Bansal et al. paper uses $\alpha = 1/2$). Choosing $\alpha$ randomly gives our ultimate result. We turn to the full analysis in the next section.

2. The Algorithm and Analysis 

Let $x^*, y^*$ be an optimum solution to the linear program above. Let $Q > 0$ be a constant determined later. Construct a new solution $x$ from $x^*$ as follows: Initialize $x := Q \cdot x^*$; for $t = 1$ to $\lfloor n/2 \rfloor$ set

$$x_{e,2t} := x_{e,2t} + x_{e,t} .$$
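The construction of $x$ from $x^*$ can be sketched for a single element as below. The function name `boost` and the list layout are assumptions; index $t$ in the text corresponds to list index `t - 1`.

```python
from fractions import Fraction


def boost(x_star_row, Q):
    """Build the row x_e from x*_e: scale by Q, then for t = 1..floor(n/2)
    add x_{e,t} onto x_{e,2t} (times are 1-indexed in the text)."""
    n = len(x_star_row)
    x = [Q * v for v in x_star_row]
    for t in range(1, n // 2 + 1):
        x[2 * t - 1] += x[t - 1]
    return x


# An element scheduled integrally at time 1 spreads to times 1, 2, 4.
spread = boost([1, 0, 0, 0], 2)

# A uniform fractional row, kept exact with Fractions.
row_star = [Fraction(1, 8)] * 8
row = boost(row_star, 3)
```

Because the loop runs over increasing $t$, the value $x_{e,t}$ is already final when it is added to $x_{e,2t}$, so mass placed at time $t''$ propagates to times $2t'', 4t'', \ldots$ as well.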

Lemma 1. For each $t \in [n]$

$$\sum_{e \in U} \sum_{t'=1}^{t} x_{e,t'} \le 2 \cdot Q \cdot t . \tag{1}$$

Moreover, for each $e \in U$ and $t \le \lfloor n/2 \rfloor$

$$\sum_{t'=t+1}^{2t} x_{e,t'} \ge Q \sum_{t'=1}^{t} x^*_{e,t'} , \tag{2}$$

and for each $t \in [n]$

$$\sum_{t'=1}^{t} x_{e,t'} \ge Q \sum_{t'=1}^{t} x^*_{e,t'} . \tag{3}$$

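Proof. We start by giving an alternative view on the definition of $x$ above. Notice that

$$x_{e,t'} = Q \sum_{t'' \in I(t')} x^*_{e,t''} \quad \text{with } I(t') := \{t'' : t' = 2^i \cdot t'' \text{ for some } i \ge 0\} . \tag{4}$$

That is, $I(t')$ is precisely the subset of indices $t''$ such that $x^*_{e,t''}$ contributes to $x_{e,t'}$. For a fixed $t \in [n]$ and $t'' \le t$, let $J(t'')$ be the subset of all indices $t' \le t$ such that $x^*_{e,t''}$ contributes to $x_{e,t'}$, i.e., $J(t'') = \{t' \le t : t'' \in I(t')\}$. By definition of $I(t')$ and $J(t'')$ we get $\sum_{t'=1}^{t} |I(t')| = \sum_{t''=1}^{t} |J(t'')|$. Also notice that $|J(t'')| = 1 + \lfloor \log(t/t'') \rfloor$. Thus,

$$\frac{1}{Q} \sum_{e \in U} \sum_{t'=1}^{t} x_{e,t'} = \sum_{e \in U} \sum_{t'=1}^{t} \sum_{t'' \in I(t')} x^*_{e,t''} = \sum_{t'=1}^{t} |I(t')| = \sum_{t''=1}^{t} |J(t'')| = t + \sum_{t''=1}^{t} \lfloor \log(t/t'') \rfloor \le t + \int_0^t \lfloor \log(t/\theta) \rfloor \, d\theta = 2t .$$

Here logarithms are base 2, the second equality uses $\sum_{e \in U} x^*_{e,t''} = 1$, and the integral evaluates to $\sum_{i \ge 1} t/2^i = t$. This concludes the proof of (1).

In order to prove (2), simply notice that for each $t'' \in \{1, \ldots, t\}$ there is $t' \in \{t+1, \ldots, 2t\}$ such that $t'' \in I(t')$; then (2) follows from (4). Finally, (3) also follows from (4) since $t' \in I(t')$ for all $t'$. □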
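The double-counting step in the proof of (1) can be checked by brute force. The sketch below enumerates the index sets $I(t')$ directly and uses the closed form $|J(t'')| = 1 + \lfloor \log_2(t/t'') \rfloor$. The helper names are assumptions, and the base-2 logarithm is inferred from the doubling in the construction.

```python
def I(t_prime):
    """Indices t'' with t' = 2^i * t'' for some integer i >= 0."""
    out = [t_prime]
    while out[-1] % 2 == 0:
        out.append(out[-1] // 2)
    return out


def J_size(t_dd, t):
    """|J(t'')| = 1 + floor(log2(t / t'')), computed with integers:
    the bit length of floor(t / t'') equals 1 + floor(log2(t / t''))."""
    return (t // t_dd).bit_length()


def count_both_ways(t):
    """Return (sum of |I(t')| over t' <= t, sum of |J(t'')| over t'' <= t)."""
    by_I = sum(len(I(tp)) for tp in range(1, t + 1))
    by_J = sum(J_size(td, t) for td in range(1, t + 1))
    return by_I, by_J


by_I, by_J = count_both_ways(100)
```

Both counts agree and stay below $2t$, which is the combinatorial content of inequality (1) once each column of $x^*$ is known to sum to 1.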

Algorithm: As discussed above, for each $e \in U$ we independently choose $\alpha_e \in [0, 1]$ randomly and uniformly. Let $t_{e,\alpha_e}$ denote the first point in time $t$ when $\sum_{t'=1}^{t} x_{e,t'} \ge \alpha_e$. In our final solution, we sequence the elements $e \in U$ in order of nondecreasing $t_{e,\alpha_e}$; ties are broken arbitrarily.
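A minimal sketch of this rounding step follows, assuming the boosted solution is given as a dictionary from elements to rows. The names `alpha_point`, `alpha_point_order`, and `random_alphas` are illustrative, and ties are broken by dictionary order, which is one arbitrary choice among many.

```python
import random


def alpha_point(row, alpha):
    """First time t (1-indexed) at which the prefix sum of row reaches alpha."""
    total = 0.0
    for t, value in enumerate(row, start=1):
        total += value
        if total >= alpha:
            return t
    # Reached only if the row sums to less than alpha, e.g. through rounding
    # error; by (3) the row sums to at least Q, so Q >= 1 avoids this case.
    return len(row)


def random_alphas(elements, rng):
    """Independent uniform draws alpha_e for every element."""
    return {e: rng.random() for e in elements}


def alpha_point_order(x, alphas):
    """Sequence elements by nondecreasing alpha-point (stable sort for ties)."""
    return sorted(x, key=lambda e: alpha_point(x[e], alphas[e]))


# Deterministic illustration with fixed alpha values.
x = {"a": [0.0, 1.0, 0.0], "b": [1.0, 0.0, 0.0], "c": [0.0, 0.0, 1.0]}
fixed_order = alpha_point_order(x, {"a": 0.5, "b": 0.5, "c": 0.5})

# Randomized use, as in the algorithm.
random_order = alpha_point_order(x, random_alphas(x, random.Random(1)))
```

Unlike the staged approach, every element receives exactly one time stamp, so the result is a permutation without any deduplication step.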

For $S \in \mathcal{S}$ and some fixed $\alpha \in (0, 1)$, let $t_{S,\alpha}$ be the last point in time $t$ for which $y^*_{S,t} < \alpha$. We observe that the contribution of set $S$ to the objective function of the linear program is

$$C^{LP}_S := \sum_{t \in [n]} (1 - y^*_{S,t}) = \int_0^1 t_{S,\alpha} \, d\alpha , \tag{5}$$

since in time step $t$ it holds that $t_{S,\alpha} \ge t$ for all $\alpha \in [0, 1]$ such that $\alpha > y^*_{S,t}$, or for a $(1 - y^*_{S,t})$ fraction of the possible $\alpha$.
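Identity (5) can be verified exactly on a small example. The sketch below assumes that $y^*_{S,t}$ is nondecreasing in $t$, which the argument implicitly relies on, and that $t_{S,\alpha} = 0$ when no time step satisfies $y^*_{S,t} < \alpha$. The function names and the sample values are hypothetical.

```python
from fractions import Fraction


def lp_contribution(y):
    """C_S^LP = sum over t of (1 - y_{S,t})."""
    return sum(1 - v for v in y)


def last_time_below(y, alpha):
    """t_{S,alpha}: last 1-indexed t with y_t < alpha, or 0 if there is none."""
    last = 0
    for t, v in enumerate(y, start=1):
        if v < alpha:
            last = t
    return last


def integral_of_t(y):
    """Exact integral of t_{S,alpha} over alpha in [0, 1]. The integrand is
    constant between consecutive values of y, so midpoints suffice."""
    points = sorted({Fraction(0), Fraction(1), *y})
    total = Fraction(0)
    for lo, hi in zip(points, points[1:]):
        mid = (lo + hi) / 2
        total += (hi - lo) * last_time_below(y, mid)
    return total


# Hypothetical nondecreasing y-values for one set over four time steps.
y = [Fraction(0), Fraction(1, 4), Fraction(1, 2), Fraction(1)]
lhs = lp_contribution(y)
rhs = integral_of_t(y)
```

Both sides evaluate to $9/4$ here: the sum is $1 + 3/4 + 1/2 + 0$, and the integral is $\frac{1}{4} \cdot 1 + \frac{1}{4} \cdot 2 + \frac{1}{2} \cdot 3$.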

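We now bound the probability that we have fewer than $K(S)$ elements from $S$ with $t_{e,\alpha_e} \le t_{S,\alpha}$ in terms of $Q$ and $\alpha$.

Lemma 2. Suppose $Q \cdot \alpha > 1$. Define $p$ such that

$$p := \exp\left(-\frac{1}{2} \cdot \left(1 - \frac{1}{Q \cdot \alpha}\right)^2 \cdot Q \cdot \alpha\right) < 1 .$$

Then for integer $i \ge 0$,

$$\Pr\left[\,|\{e \in S : t_{e,\alpha_e} \le 2^i \cdot t_{S,\alpha}\}| < K(S)\right] \le p^{i+1} .$$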
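Proof. Our analysis follows some of the analysis of Bansal et al. for a stage. Let

$$A := \Big\{e \in S : \sum_{t' \le 2^i \cdot t_{S,\alpha}} x_{e,t'} \ge 1\Big\} .$$

Then observe that for any $e \in A$ it holds that $\Pr[t_{e,\alpha_e} \le 2^i \cdot t_{S,\alpha}] = 1$. By the properties of the linear program,

$$\sum_{e \in S \setminus A} \sum_{t' \le t_{S,\alpha}} x^*_{e,t'} \ge (K(S) - |A|) \cdot y^*_{S,t_{S,\alpha}+1} \ge (K(S) - |A|) \cdot \alpha ,$$

so that by (3)

$$\sum_{e \in S \setminus A} \sum_{t' \le t_{S,\alpha}} x_{e,t'} \ge (K(S) - |A|) \cdot Q \cdot \alpha .$$

More generally, it follows from induction on $i$ and (2) and (3) that

$$\sum_{e \in S \setminus A} \sum_{t' \le 2^i \cdot t_{S,\alpha}} x_{e,t'} \ge (i+1) \cdot (K(S) - |A|) \cdot Q \cdot \alpha .$$

For any $e \in S \setminus A$, let random variable $X_e$ be 1 if $t_{e,\alpha_e} \le 2^i \cdot t_{S,\alpha}$ and 0 otherwise. Note that $\Pr[X_e = 1] = \sum_{t' \le 2^i \cdot t_{S,\alpha}} x_{e,t'}$. Let $X := \sum_{e \in S \setminus A} X_e$ and $\mu := E[X]$, so that

$$\mu = E[X] = \sum_{e \in S \setminus A} \sum_{t' \le 2^i \cdot t_{S,\alpha}} x_{e,t'} \ge (i+1) \cdot (K(S) - |A|) \cdot Q \cdot \alpha .$$

Note that if $|A| \ge K(S)$, then $\Pr[\,|\{e \in S : t_{e,\alpha_e} \le 2^i \cdot t_{S,\alpha}\}| < K(S)] = 0$, so we assume that $|A| < K(S)$. Then

$$
\begin{aligned}
\Pr\left[\,|\{e \in S : t_{e,\alpha_e} \le 2^i \cdot t_{S,\alpha}\}| < K(S)\right]
&= \Pr\left[\,|\{e \in S \setminus A : t_{e,\alpha_e} \le 2^i \cdot t_{S,\alpha}\}| < K(S) - |A|\right] \\
&= \Pr\left[X < K(S) - |A|\right] \\
&\le \Pr\left[X < \frac{\mu}{(i+1) \cdot Q \cdot \alpha}\right] \\
&= \Pr\left[X < \mu \cdot \left(1 - \left(1 - \frac{1}{(i+1) \cdot Q \cdot \alpha}\right)\right)\right] \\
&\le \exp\left(-\frac{1}{2} \cdot \left(1 - \frac{1}{Q \cdot \alpha}\right)^2 \cdot (i+1) \cdot Q \cdot \alpha\right) = p^{i+1} ,
\end{aligned}
$$

where we use the Chernoff bound $\Pr[X < \mu \cdot (1 - \beta)] \le \exp(-\frac{1}{2} \cdot \beta^2 \cdot \mu)$ (see, for example, Motwani and Raghavan [5, Section 4.1]), the bound $\mu \ge (i+1) \cdot Q \cdot \alpha$, and the fact that

$$-\left(1 - \frac{1}{(i+1) \cdot Q \cdot \alpha}\right)^2 \le -\left(1 - \frac{1}{Q \cdot \alpha}\right)^2$$

for $i \ge 0$ and $Q \cdot \alpha > 1$. □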



Let $C_S$ be a random variable giving the position of the $K(S)$th element of $S$ in the ordering we construct, and let $C^{LP}_S$ be the contribution of set $S$ to the objective function as defined in (5). Then we can bound the cost of our schedule as follows, where $OPT_{LP} = \sum_{S \in \mathcal{S}} C^{LP}_S$ and $OPT$ is the cost of an optimal schedule.

Lemma 3. If $Q$ and $\alpha$ are chosen such that $p < 1/2$, then

$$E\Big[\sum_{S \in \mathcal{S}} C_S\Big] \le \frac{2Q}{1 - \alpha} \cdot \frac{1 - p}{1 - 2p} \cdot OPT_{LP} + OPT .$$



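Proof. Let $t_S$ be the first point in time when $|\{e \in S : t_{e,\alpha_e} \le t_S\}| \ge K(S)$. Then by Lemma 2 we know that the probability that $t_{S,\alpha} < t_S \le 2 \cdot t_{S,\alpha}$ is at most $p$, since the probability that $t_S > t_{S,\alpha}$ is at most $p$ by itself. Similarly, the probability that $2 \cdot t_{S,\alpha} < t_S \le 4 \cdot t_{S,\alpha}$ is at most $p^2$, the probability that $4 \cdot t_{S,\alpha} < t_S \le 8 \cdot t_{S,\alpha}$ is at most $p^3$, and so on, so that

$$E[t_S] \le t_{S,\alpha} + t_{S,\alpha} \sum_{i \ge 0} 2^i \cdot p^{i+1} = t_{S,\alpha} \cdot \left(1 + \frac{p}{1 - 2p}\right) = t_{S,\alpha} \cdot \frac{1 - p}{1 - 2p} . \tag{6}$$

Note that for all $t \le t_{S,\alpha}$ it holds that $1 - y^*_{S,t} > 1 - \alpha$, so that $C^{LP}_S \ge t_{S,\alpha} (1 - \alpha)$, or $t_{S,\alpha} \le C^{LP}_S / (1 - \alpha)$. Thus

$$E[t_S] \le C^{LP}_S \cdot \frac{1}{1 - \alpha} \cdot \frac{1 - p}{1 - 2p} .$$

Observe that $C_S \le |\{e \in U \setminus S : t_{e,\alpha_e} \le t_S\}| + K(S)$. Note that for any fixed element $e \notin S$ and time $t$, the probability that $t_{e,\alpha_e} \le t$ is $\min\{1, \sum_{t' \le t} x_{e,t'}\}$, so that

$$E\left[\,|\{e \in U \setminus S : t_{e,\alpha_e} \le t\}|\right] \le \sum_{e \in U} \sum_{t' \le t} x_{e,t'} \le 2Q \cdot t$$

by (1). Then we have that

$$E[C_S] \le 2Q \cdot E[t_S] + K(S) \le \frac{2Q}{1 - \alpha} \cdot \frac{1 - p}{1 - 2p} \cdot C^{LP}_S + K(S) , \tag{7}$$

from which it follows that

$$E\Big[\sum_{S \in \mathcal{S}} C_S\Big] \le \frac{2Q}{1 - \alpha} \cdot \frac{1 - p}{1 - 2p} \cdot OPT_{LP} + OPT ,$$

since in any solution $\sum_{S \in \mathcal{S}} K(S) \le OPT$. □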

We try to tune the various parameters to obtain the best possible performance guarantee. If we set $\alpha := 1/2$ (as did Bansal et al. [2]) and $Q := 10.05$, then $p \approx 0.1995$, and thus we obtain

$$\frac{2Q}{1 - \alpha} \cdot \frac{1 - p}{1 - 2p} + 1 \approx 54.54$$

for a performance guarantee of about 55. However, we can do better if we choose $\alpha$ (and $Q$) randomly.
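The arithmetic for the fixed choice $\alpha = 1/2$, $Q = 10.05$ can be reproduced with a few lines. The function names are assumptions; the formulas are those of Lemma 2 and Lemma 3, with $OPT_{LP} \le OPT$ used to merge the two terms.

```python
import math


def chernoff_p(Q, alpha):
    """p from Lemma 2: exp(-1/2 * (1 - 1/(Q*alpha))^2 * Q*alpha)."""
    z = Q * alpha
    if z <= 1:
        raise ValueError("Lemma 2 requires Q * alpha > 1")
    return math.exp(-0.5 * (1 - 1 / z) ** 2 * z)


def fixed_alpha_guarantee(Q, alpha):
    """Bound from Lemma 3 with OPT_LP <= OPT:
    2Q/(1 - alpha) * (1 - p)/(1 - 2p) + 1."""
    p = chernoff_p(Q, alpha)
    if p >= 0.5:
        raise ValueError("Lemma 3 requires p < 1/2")
    return 2 * Q / (1 - alpha) * (1 - p) / (1 - 2 * p) + 1


p = chernoff_p(10.05, 0.5)
bound = fixed_alpha_guarantee(10.05, 0.5)
print(round(p, 4), round(bound, 2))  # values quoted in the text: 0.1995 and 54.54
```

The factor $1/(1 - \alpha) = 2$ is what the random choice of $\alpha$ in the theorem below manages to avoid.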

Theorem 1. If we choose $\alpha$ independently at random from $(0, 1)$ according to the density function $f(\alpha) = 2 \cdot \alpha$ and set $Q := z/\alpha$ for some appropriately chosen constant $z$, then the algorithm has performance guarantee less than 27.78.

Proof. Notice that $\alpha \cdot Q$ is equal to the fixed constant $z$ and $p = \exp\left(-\frac{1}{2} \cdot \left(1 - \frac{1}{z}\right)^2 \cdot z\right)$ depends only on $z$ and is thus constant.

In the proof of Lemma 3 we have obtained bounds on the expectations of $t_S$ and $C_S$ under the assumption that the values of $\alpha$ and $Q$ are fixed. We refer to these conditional expectations by $E_\alpha$, such that

$$E_\alpha[t_S] \le t_{S,\alpha} \cdot \frac{1 - p}{1 - 2p} \quad \text{due to (6), and}$$

$$E_\alpha[C_S] \le 2Q \cdot E_\alpha[t_S] + K(S) \quad \text{due to (7).}$$
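Unconditioning together with (5) then yields

$$
\begin{aligned}
E[C_S] &= \int_0^1 f(\alpha) \cdot E_\alpha[C_S] \, d\alpha \\
&\le \int_0^1 2\alpha \cdot 2Q \cdot t_{S,\alpha} \cdot \frac{1 - p}{1 - 2p} \, d\alpha + K(S) \\
&= 4z \cdot \frac{1 - p}{1 - 2p} \int_0^1 t_{S,\alpha} \, d\alpha + K(S) \\
&= 4z \cdot \frac{1 - p}{1 - 2p} \cdot C^{LP}_S + K(S) .
\end{aligned}
$$

Thus, we get

$$E\Big[\sum_{S \in \mathcal{S}} C_S\Big] \le 4z \cdot \frac{1 - p}{1 - 2p} \cdot OPT_{LP} + OPT \le \left(1 + 4z \cdot \frac{1 - p}{1 - 2p}\right) \cdot OPT .$$

If we set $z := 5.03$, then $p \approx 0.1990$, and the performance guarantee is less than 27.78. □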
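The final constant can be reproduced numerically, and a coarse grid search gives a feel for how the bound $1 + 4z \cdot \frac{1-p}{1-2p}$ depends on $z$. The function names, the grid range $[2, 10]$, and the step size 0.01 are arbitrary choices made for this sketch.

```python
import math


def p_of(z):
    """p as a function of z = alpha * Q (requires z > 1)."""
    return math.exp(-0.5 * (1 - 1 / z) ** 2 * z)


def guarantee(z):
    """Performance guarantee 1 + 4z(1 - p)/(1 - 2p), or infinity when the
    requirements z > 1 and p < 1/2 are not met."""
    if z <= 1:
        return math.inf
    p = p_of(z)
    if p >= 0.5:
        return math.inf
    return 1 + 4 * z * (1 - p) / (1 - 2 * p)


value = guarantee(5.03)

# Coarse grid search over z in [2, 10] with step 0.01.
grid = [2 + k / 100 for k in range(801)]
best_z = min(grid, key=guarantee)
best_value = guarantee(best_z)
```

In this sketch `value` comes out slightly below 27.78, and the grid minimum lies close to the choice $z = 5.03$ made in the proof.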